    Hard to Cheat: A Turing Test based on Answering Questions about Images

    Progress in language and image understanding by machines has sparked the interest of the research community in more open-ended, holistic tasks, and refueled an old AI dream of building intelligent machines. We discuss a few prominent challenges that characterize such holistic tasks and argue for "question answering about images" as a particularly appealing instance of such a holistic task. In particular, we point out that it is a version of a Turing Test that is likely to be more robust to over-interpretations, and we contrast it with tasks like grounding and generation of descriptions. Finally, we discuss tools to measure progress in this field. Comment: Presented in AAAI-15 Workshop: Beyond the Turing Tes

    Spatio-Temporal Image Boundary Extrapolation

    Boundary prediction in images as well as video has been a very active topic of research, and organizing visual information into boundaries and segments is believed to be a cornerstone of visual perception. While prior work has focused on predicting boundaries for observed frames, our work aims at predicting boundaries of future unobserved frames. This requires our model to learn about the fate of boundaries and extrapolate motion patterns. We experiment on an established real-world video segmentation dataset, which provides a testbed for this new task. We show for the first time spatio-temporal boundary extrapolation in this challenging scenario. Furthermore, we show long-term prediction of boundaries in situations where the motion is governed by the laws of physics. We successfully predict boundaries in a billiard scenario without assuming a strong parametric model or any notion of objects. We argue that our model has, with minimal model assumptions, derived a notion of 'intuitive physics' that can be applied to novel scenes.

    Ask Your Neurons: A Neural-based Approach to Answering Questions about Images

    We address a question answering task on real-world images that is set up as a Visual Turing Test. By combining the latest advances in image representation and natural language processing, we propose Neural-Image-QA, an end-to-end formulation of this problem for which all parts are trained jointly. In contrast to previous efforts, we are facing a multi-modal problem where the language output (answer) is conditioned on visual and natural language input (image and question). Our approach Neural-Image-QA doubles the performance of the previous best approach on this problem. We provide additional insights into the problem by analyzing how much information is contained only in the language part, for which we provide a new human baseline. To study human consensus, which is related to the ambiguities inherent in this challenging task, we propose two novel metrics and collect additional answers, which extend the original DAQUAR dataset to DAQUAR-Consensus. Comment: ICCV'15 (Oral

    Comment le roman peut-il être «wagnérien»? Le cas d'Élémir Bourges

    The paper focuses on the connections between Bourges's novel Le Crépuscule des dieux (1884) and Wagner's Ring cycle: their presence is traced in the subject matter of the work and at the level of character construction and composition. The author argues that Bourges is the most complete example of the influence of Wagnerian aesthetics on the French novel.

    „Théodore de Banville: artysta czy rzemieślnik?”

    This paper reflects on the status of Théodore de Banville in the literary Pantheon by attempting to determine the proportions of art and artisanry in his poetic work. While Banville deserves to be called an artist, provided the term is based on the 19th-century definition, the name of artisan, so often applied to him by critics and literary historians, may also be justified. The paper looks closely at Banville's work: it analyzes the role of creative inspiration and of patiently built text structure, and it observes the importance the author attached to versification techniques, his love of technical difficulty, his meticulousness, and his writing skills, as well as the frequency with which he creates images and metaphors referring to what were previously called 'mechanical arts'. In short, taking into account everything that falls under the category of poetic technique, the answer to the question posed in the title, whether Banville was an artist or an artisan, is that he was both.

    «L'Idéal sous les voiles de l’électricité». A propos de «L'Ève future» de Villiers de l’Isle-Adam

    The article discusses the extremely idealistic message of the 1886 novel L’Ève future by Villiers de l’Isle-Adam. The motif of an artificial woman constructed by a genius scientist is a pretext for philosophical and metaphysical meditation on human fate, an attempt to reach, through art, the deepest mysteries of the spiritual world set against shallow visible reality. The subject matter and rhetoric of the work are an expressive illustration of symbolism in the French novel.

    Adapting Visual Question Answering Models for Enhancing Multimodal Community Q&A Platforms

    Question categorization and expert retrieval methods have been crucial for information organization and accessibility in community question answering (CQA) platforms. Research in this area, however, has dealt with only the text modality. With the increasingly multimodal nature of web content, we focus on extending these methods to CQA questions accompanied by images. Specifically, we leverage the success of representation learning for text and images in the visual question answering (VQA) domain, and adapt the underlying concept and architecture for automated category classification and expert retrieval on image-based questions posted on Yahoo! Chiebukuro, the Japanese counterpart of Yahoo! Answers. To the best of our knowledge, this is the first work to tackle the multimodality challenge in CQA, and to adapt VQA models for tasks on a more ecologically valid source of visual questions. Our analysis of the differences between visual QA and community QA data drives our proposal of novel augmentations of an attention method tailored for CQA, and the use of auxiliary tasks for learning better grounding features. Our final model markedly outperforms the text-only and VQA model baselines for both classification and expert retrieval on real-world multimodal CQA data. Comment: Submitted for review at CIKM 201

    Long-Term Image Boundary Prediction

    Boundary estimation in images and videos has been a very active topic of research, and organizing visual information into boundaries and segments is believed to be a cornerstone of visual perception. While prior work has focused on estimating boundaries for observed frames, our work aims at predicting boundaries of future unobserved frames. This requires our model to learn about the fate of boundaries and corresponding motion patterns, including a notion of "intuitive physics". We experiment on natural video sequences along with synthetic sequences with deterministic physics-based and agent-based motions. Although it is not our primary goal, we also show that fusing RGB and boundary prediction leads to improved RGB predictions. Comment: Accepted in the AAAI Conference for Artificial Intelligence, 201